Add CreateCallbackAsync and WaitForCallbackAsync (DOTNET-8660) #2373

Draft
GarrettBeatty wants to merge 1 commit into gcbeatty/durable-child-context from gcbeatty/durable-callbacks
Contributor

@GarrettBeatty GarrettBeatty commented May 14, 2026

#2216

Fixes DOTNET-8660. Stacked on top of #2372 (Wave 0 cross-cutting types).

What

Adds callback support to Amazon.Lambda.DurableExecution. A workflow can now hand a service-allocated CallbackId to an external system (a queue consumer, a human-approval UI, a long-running job runner) and suspend until that system reports back via the durable execution service. Two entry points: a low-level handle (CreateCallbackAsync) and the common "submit + wait" composition (WaitForCallbackAsync).

Public API:

| Type | Purpose |
| --- | --- |
| `IDurableContext.CreateCallbackAsync<T>(...)` | Allocate a callback; returns an `ICallback<T>` handle. Errors are deferred to `GetResultAsync` so user code between create and await runs deterministically across replays. |
| `IDurableContext.WaitForCallbackAsync<T>(...)` | Composite: CreateCallback + submitter step + `GetResultAsync` inside a child context. Common path for "submit job, wait for completion". |
| `ICallback<T>` | Handle exposing `CallbackId` (give to the external system) and `GetResultAsync` (suspends until completion). |
| `IWaitForCallbackContext` | Logger-only context passed to the submitter delegate. Distinct from `IStepContext` so the submitter API can evolve independently. |
| `CallbackConfig` | `Timeout` + `HeartbeatTimeout`. Sub-second positive values are rejected (service timer granularity is 1s); `TimeSpan.Zero` disables. |
| `WaitForCallbackConfig : CallbackConfig` | Adds `RetryStrategy` for the submitter step. |
| `CallbackException` (base) + `CallbackFailedException`, `CallbackTimeoutException`, `CallbackSubmitterException` | Subclass tree so catch clauses can pattern-match the failure mode. Carries `CallbackId`, `ErrorType`, `ErrorData`, `OriginalStackTrace`. |
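
The CallbackConfig rules in the table can be illustrated with a small stand-alone sketch. `SketchCallbackConfig`, `Validate`, and `ToWholeSeconds` are hypothetical names, not the SDK's types, and the use of `ArgumentOutOfRangeException` is an assumption; the PR text only says that negative and sub-second positive values are rejected and that `TimeSpan.Zero` disables the timer.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for illustration only; not the SDK's CallbackConfig.
public sealed class SketchCallbackConfig
{
    private TimeSpan _timeout = TimeSpan.Zero;
    private TimeSpan _heartbeatTimeout = TimeSpan.Zero;

    public TimeSpan Timeout
    {
        get => _timeout;
        set => _timeout = Validate(value, nameof(Timeout));
    }

    public TimeSpan HeartbeatTimeout
    {
        get => _heartbeatTimeout;
        set => _heartbeatTimeout = Validate(value, nameof(HeartbeatTimeout));
    }

    // Zero means "disabled"; negative and sub-second positive values are rejected.
    // The exception type is an assumption; the PR does not name it.
    public static TimeSpan Validate(TimeSpan value, string name)
    {
        if (value == TimeSpan.Zero) return value;
        if (value < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(name, value, "Timeout must not be negative.");
        if (value < TimeSpan.FromSeconds(1))
            throw new ArgumentOutOfRangeException(name, value, "Positive timeouts must be at least 1 second.");
        return value;
    }

    // Rounds a positive timeout up to whole seconds; returns null when disabled.
    public static int? ToWholeSeconds(TimeSpan value) =>
        value > TimeSpan.Zero ? (int?)(int)Math.Ceiling(value.TotalSeconds) : null;
}
```

Under these assumptions, assigning `TimeSpan.FromMilliseconds(500)` to `Timeout` throws, while `TimeSpan.Zero` is accepted and maps to no service-side timer.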

Both APIs read the ILambdaSerializer from ILambdaContext.Serializer (typically registered via LambdaBootstrapBuilder.Create(handler, serializer)) and throw InvalidOperationException if no serializer is registered. AOT and reflection-based scenarios share a single overload — the AOT story is determined entirely by the registered serializer (e.g., SourceGeneratorLambdaJsonSerializer<TContext> for AOT).

How

Internal/CallbackOperation<T> mirrors the Step/Wait pattern from #2360 and the child-context pattern from #2370:

  • Fresh execution. Synchronously flushes a CALLBACK START checkpoint. The service stamps a freshly-allocated CallbackId onto the response; LambdaDurableServiceClient gains an onNewOperations hook so that ID flows back into ExecutionState during the START flush, where the operation can read it. The handle is returned immediately — CreateCallbackAsync always succeeds.
  • GetResultAsync suspends. On the invocation that first reaches the await, the workflow hits Termination.SuspendAndAwait and Lambda exits. When the external system delivers a result, the service re-invokes; replay observes the terminal checkpoint and returns (or throws) immediately.
  • Replay. SUCCEEDED returns the cached value (deserialized via the registered ILambdaSerializer). FAILED throws CallbackFailedException. TIMED_OUT throws CallbackTimeoutException. STARTED / PENDING re-suspend (external system hasn't responded yet). Any other status throws NonDeterministicExecutionException.
  • Deferred error propagation. Terminal status observed during Start/Replay is stashed on _terminalReplay and only resolved inside GetResultAsync. This keeps CreateCallbackAsync deterministically successful, so user code between create and await sees the same control flow on fresh execution and replay.
  • WaitForCallbackAsync composes RunInChildContextAsync (from #2370) + CreateCallbackAsync + a submitter StepAsync + GetResultAsync. The child-context wrapper gives a clean observability boundary (SubType = WaitForCallback) and a single error-mapping site: submitter step failures surface as CallbackSubmitterException; callback failures/timeouts preserve their subclass through child-context replay (a CallbackTimeoutException thrown inside the child remains a CallbackTimeoutException after the parent CONTEXT-FAILED replay).
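
The replay mapping and deferred error propagation described above can be sketched in isolation. Everything here is a hypothetical stand-in: `SketchCallbackHandle` and `SketchSuspendException` are invented names, the status strings follow the PR text, and the standard .NET exceptions stand in for `CallbackFailedException`, `CallbackTimeoutException`, and `NonDeterministicExecutionException`. The real operation is asynchronous and suspends through `Termination.SuspendAndAwait`; this version is synchronous to stay self-contained.

```csharp
#nullable enable
using System;

// Stand-in for "suspend the workflow and let Lambda exit".
public sealed class SketchSuspendException : Exception
{
    public SketchSuspendException() : base("Callback still pending; workflow would suspend.") { }
}

// Hypothetical handle: construction never throws, errors surface in GetResult.
public sealed class SketchCallbackHandle
{
    private readonly string _status;
    private readonly string? _cachedResult;

    public string CallbackId { get; }

    // Mirrors "CreateCallbackAsync always succeeds": the observed status is
    // only stashed here, even when it is already FAILED or TIMED_OUT.
    public SketchCallbackHandle(string callbackId, string status, string? cachedResult = null)
    {
        CallbackId = callbackId;
        _status = status;
        _cachedResult = cachedResult;
    }

    // Resolves the stashed status at the await site.
    public string GetResult() => _status switch
    {
        "SUCCEEDED" => _cachedResult ?? throw new InvalidOperationException("Missing cached result."),
        "FAILED" => throw new InvalidOperationException($"Callback {CallbackId} failed."),
        "TIMED_OUT" => throw new TimeoutException($"Callback {CallbackId} timed out."),
        "STARTED" or "PENDING" => throw new SketchSuspendException(),
        _ => throw new NotSupportedException($"Unexpected callback status '{_status}'."),
    };
}
```

Because the constructor never inspects the status, code that runs between creating the handle and calling `GetResult` follows the same path on fresh execution and on replay, which is the property the deferred-propagation bullet describes.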

Testing

42 new unit tests across CallbackOperationTests, WaitForCallbackTests, DurableFunctionTests, and ExceptionsTests:

  • Fresh execution + sync-flush of CALLBACK START (CallbackId stamped onto state).
  • Suspend on first GetResultAsync; replay returns cached value without re-running.
  • Terminal-state replay: SUCCEEDED deserializes, FAILED throws CallbackFailedException, TIMED_OUT throws CallbackTimeoutException.
  • STARTED/PENDING replay re-suspends; unknown status throws NonDeterministicExecutionException.
  • CreateCallbackAsync is always successful even for terminal-state replays (deferred error propagation).
  • CallbackConfig.Timeout / HeartbeatTimeout validation: rejects negative and sub-second positive values, accepts TimeSpan.Zero.
  • WaitForCallbackAsync: submitter receives the CallbackId; submitter failure (after retries exhausted) surfaces as CallbackSubmitterException; callback failure/timeout subclass survives parent CONTEXT-FAILED replay; happy-path returns the deserialized result.
  • Exception type hierarchy + property serialization round-trip.
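
As a rough illustration of why the exception subclass tree matters to callers, the sketch below orders catch clauses from most to least specific. The `Sketch*` types are hypothetical stand-ins that only mirror the names in this PR; the real constructors and the extra properties (`ErrorType`, `ErrorData`, `OriginalStackTrace`) are not modeled.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for the hierarchy; the SDK's real types may differ in shape.
public class SketchCallbackException : Exception
{
    public string? CallbackId { get; }

    public SketchCallbackException(string message, string? callbackId) : base(message)
        => CallbackId = callbackId;
}

public sealed class SketchCallbackFailedException : SketchCallbackException
{
    public SketchCallbackFailedException(string message, string? callbackId) : base(message, callbackId) { }
}

public sealed class SketchCallbackTimeoutException : SketchCallbackException
{
    public SketchCallbackTimeoutException(string message, string? callbackId) : base(message, callbackId) { }
}

public sealed class SketchCallbackSubmitterException : SketchCallbackException
{
    public SketchCallbackSubmitterException(string message, string? callbackId) : base(message, callbackId) { }
}

public static class FailureModeSketch
{
    // Runs the supplied delegate and reports which failure mode, if any, occurred.
    public static string Describe(Func<string> awaitResult)
    {
        try
        {
            return "ok: " + awaitResult();
        }
        catch (SketchCallbackTimeoutException ex)
        {
            return $"timed out ({ex.CallbackId})";
        }
        catch (SketchCallbackSubmitterException)
        {
            return "submitter failed";
        }
        catch (SketchCallbackException ex)
        {
            // Base-class clause last: catches SketchCallbackFailedException and any future subclass.
            return $"failed ({ex.CallbackId})";
        }
    }
}
```

A caller that only cares whether the callback succeeded can catch the base type alone; the subclasses exist for callers that want to retry on timeout but not on explicit failure.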

5 new integration tests (require AWS credentials to run):
CreateCallbackHappyPath, CallbackTimeout, CallbackFailed, WaitForCallbackHappyPath, WaitForCallbackSubmitterFails. Each ships a deployable test function + Dockerfile under IntegrationTests/TestFunctions/.

203/203 unit tests pass on net8.0 and net10.0 (161 base + 42 new). Production build clean: 0 warnings, TreatWarningsAsErrors enforced.

Out of scope (follow-up PRs)

  • InvokeAsync / MapAsync / ParallelAsync / WaitForConditionAsync
  • DefaultJsonCheckpointSerializer
  • DurableLogger replay-suppression (currently NullLogger)
  • Annotations source-generator integration / [DurableExecution] attribute
  • DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
  • dotnet new lambda.DurableFunction blueprint


COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from 464c591 to d308c3b Compare May 14, 2026 21:49
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch 2 times, most recently from 951fcd1 to 1c88461 Compare May 14, 2026 22:19
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from d308c3b to be4c3ad Compare May 18, 2026 15:23
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch from 1c88461 to 5cc9a04 Compare May 18, 2026 15:46
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch 3 times, most recently from ad4d208 to 3acbed5 Compare May 20, 2026 17:46
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch 3 times, most recently from 0d5a1f9 to fc5dbbd Compare May 20, 2026 18:12

@GarrettBeatty GarrettBeatty added the Release Not Needed Add this label if a PR does not need to be released. label May 20, 2026
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-callbacks branch from fc5dbbd to f59dba9 Compare May 20, 2026 18:42
Adds callback support to the .NET Durable Execution SDK. CreateCallbackAsync
returns an ICallback<T> handle (CallbackId + GetResultAsync) that suspends
the workflow until an external system delivers a result via the durable
execution service. WaitForCallbackAsync composes CreateCallback + a
submitter step + GetResultAsync inside a child context for the common
"submit and wait" pattern.

Public surface:
- IDurableContext.CreateCallbackAsync<T> (single overload)
- IDurableContext.WaitForCallbackAsync<T> (single overload)
- ICallback<T> with CallbackId and GetResultAsync
- IWaitForCallbackContext (Logger only) for submitter functions
- CallbackConfig (Timeout + HeartbeatTimeout, validates sub-second values)
- WaitForCallbackConfig : CallbackConfig adds RetryStrategy
- Exception subclass tree: CallbackException base + CallbackFailedException,
  CallbackTimeoutException, CallbackSubmitterException

Both APIs read the ILambdaSerializer from ILambdaContext.Serializer
(typically registered via LambdaBootstrapBuilder.Create(handler, serializer))
and throw InvalidOperationException if no serializer is registered. AOT and
reflection-based scenarios share a single overload — the AOT story is
determined by the registered serializer.

Internal:
- CallbackOperation<T> handles fresh execution sync-flush of START with
  service-allocated CallbackId, deferred error propagation, and replay
  for SUCCEEDED/FAILED/TIMED_OUT/STARTED/PENDING. Unknown statuses throw
  NonDeterministicExecutionException.
- LambdaDurableServiceClient gains an onNewOperations callback so the
  freshly-allocated CallbackId from NewExecutionState flows back into
  ExecutionState during the START flush.
- WaitForCallback's error mapping preserves subclass fidelity on
  parent-CONTEXT-FAILED replay (CallbackTimeoutException remains
  CallbackTimeoutException, etc.).

Adds unit tests + integration tests covering happy path, timeout,
failure, submitter failure, replay determinism, and replay of each
exception subtype.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Adds first-class callback support to the Amazon.Lambda.DurableExecution .NET SDK, enabling workflows to pause until an external system completes a callback, and providing a convenience “submit + wait” composite API.

Changes:

  • Introduces CreateCallbackAsync<T> / ICallback<T> for durable callback handles and result retrieval.
  • Adds WaitForCallbackAsync<T> with WaitForCallbackConfig + IWaitForCallbackContext, including retry wiring and error remapping.
  • Extends checkpoint plumbing to merge service-returned NewExecutionState.Operations into in-memory ExecutionState, plus broad unit/integration test coverage and design-doc updates.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/WaitForCallbackTests.cs | New unit tests covering WaitForCallbackAsync behavior, naming, determinism, and exception mapping. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/RecordingBatcher.cs | Adds a flush hook used by tests to simulate service-side state updates (e.g., CallbackId allocation). |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/MockLambdaClient.cs | Adds a customizable checkpoint response handler for tests modeling NewExecutionState behavior. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/ExceptionsTests.cs | Adds ctor/property tests for new callback exception types. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/DurableFunctionTests.cs | Adds end-to-end unit tests through DurableFunction.WrapAsync for callback allocation/replay determinism. |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/CallbackOperationTests.cs | New unit tests covering callback operation start/replay/result/error behavior and serializer requirements. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitForCallbackSubmitterFailsTest.cs | New integration test validating submitter failure surfaces as CallbackSubmitterException. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/WaitForCallbackHappyPathTest.cs | New integration test validating two-Lambda “external system” happy path for WaitForCallbackAsync. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackSubmitterFailsFunction/WaitForCallbackSubmitterFailsFunction.csproj | New test function project used by integration tests. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackSubmitterFailsFunction/Function.cs | Workflow test function that intentionally fails the submitter. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackSubmitterFailsFunction/Dockerfile | Container packaging for the submitter-failure integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackHappyPathFunction/WaitForCallbackHappyPathFunction.csproj | New test function project for happy-path WaitForCallback integration. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackHappyPathFunction/Function.cs | Workflow test function that invokes an external approver Lambda and waits for callback completion. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/WaitForCallbackHappyPathFunction/Dockerfile | Container packaging for the happy-path workflow function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CreateCallbackHappyPathFunction/Function.cs | Workflow test function that uses CreateCallbackAsync and suspends on GetResultAsync. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CreateCallbackHappyPathFunction/Dockerfile | Container packaging for CreateCallback happy-path integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CreateCallbackHappyPathFunction/CreateCallbackHappyPathFunction.csproj | New CreateCallback integration test function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackTimeoutFunction/Function.cs | Workflow test function validating callback timeouts via CallbackConfig.Timeout. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackTimeoutFunction/Dockerfile | Container packaging for callback-timeout integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackTimeoutFunction/CallbackTimeoutFunction.csproj | New callback-timeout integration test function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackFailedFunction/Function.cs | Workflow test function validating callback failure delivery via SendDurableExecutionCallbackFailure. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackFailedFunction/Dockerfile | Container packaging for callback-failure integration test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/CallbackFailedFunction/CallbackFailedFunction.csproj | New callback-failure integration test function project. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ApproverFunction/Function.cs | External “approver” Lambda that completes callbacks via SendDurableExecutionCallbackSuccess. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ApproverFunction/Dockerfile | Container packaging for external approver function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ApproverFunction/ApproverFunction.csproj | New external approver function project (includes AWSSDK.Lambda). |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/DurableFunctionDeployment.cs | Enhances deployment helper to optionally deploy a paired external Lambda; adds cross-process build locking. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/CreateCallbackHappyPathTest.cs | New integration test for CreateCallback success delivery. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/CallbackTimeoutTest.cs | New integration test for callback timeout surface/type recording. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/CallbackFailedTest.cs | New integration test for callback failure surface/type recording. |
| Libraries/src/Amazon.Lambda.DurableExecution/WaitForCallbackConfig.cs | Adds configuration type for WaitForCallback (inherits callback timeouts + submitter retry strategy). |
| Libraries/src/Amazon.Lambda.DurableExecution/Services/LambdaDurableServiceClient.cs | Adds onNewOperations callback to checkpointing and maps callback ops from NewExecutionState. |
| Libraries/src/Amazon.Lambda.DurableExecution/Operation.cs | Adds WaitForCallback subtype constant and TIMED_OUT status constant. |
| Libraries/src/Amazon.Lambda.DurableExecution/IWaitForCallbackContext.cs | New submitter-context interface (logger-only) for WaitForCallbackAsync. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/LambdaSerializerHelper.cs | Centralizes “serializer required” enforcement and message. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/CallbackOperation.cs | Implements durable callback operation semantics and ICallback<T> handle behavior. |
| Libraries/src/Amazon.Lambda.DurableExecution/IDurableContext.cs | Adds public API surface for CreateCallbackAsync<T> and WaitForCallbackAsync<T>. |
| Libraries/src/Amazon.Lambda.DurableExecution/ICallback.cs | Introduces public callback handle interface (CallbackId + GetResultAsync). |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableFunction.cs | Wires serializer helper and merges NewExecutionState operations into ExecutionState during checkpointing. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs | Implements CreateCallbackAsync and WaitForCallbackAsync composition + error mapping. |
| Libraries/src/Amazon.Lambda.DurableExecution/CallbackException.cs | Adds callback exception hierarchy (CallbackException + Failed/Timeout/Submitter). |
| Libraries/src/Amazon.Lambda.DurableExecution/CallbackConfig.cs | Adds timeout + heartbeat timeout config with sub-second validation. |
| Docs/durable-execution-design.md | Updates design doc to reflect callback APIs, contexts, and exception hierarchy. |


Error = sdkOp.CallbackDetails.Error != null ? new ErrorObject
{
    ErrorType = sdkOp.CallbackDetails.Error.ErrorType,
    ErrorMessage = sdkOp.CallbackDetails.Error.ErrorMessage
Comment on lines +163 to +179
public async Task<T> GetResultAsync(CancellationToken cancellationToken = default)
{
    cancellationToken.ThrowIfCancellationRequested();

    // Terminal-state checkpoint already observed by Start/Replay: return
    // (or throw) immediately without suspending.
    if (_terminalReplay != null)
    {
        return ResolveTerminal(_terminalReplay);
    }

    // No terminal state yet. Suspend the workflow; the service re-invokes
    // when the external system delivers a result.
    return await Termination.SuspendAndAwait<T>(
        TerminationReason.CallbackPending,
        $"callback:{Name ?? OperationId}");
}
Comment on lines +233 to +243
private SdkCallbackOptions? BuildCallbackOptions()
{
    if (_config == null) return null;
    if (_config.Timeout == TimeSpan.Zero && _config.HeartbeatTimeout == TimeSpan.Zero) return null;

    var options = new SdkCallbackOptions();
    if (_config.Timeout > TimeSpan.Zero)
        options.TimeoutSeconds = (int)Math.Max(1, Math.Ceiling(_config.Timeout.TotalSeconds));
    if (_config.HeartbeatTimeout > TimeSpan.Zero)
        options.HeartbeatTimeoutSeconds = (int)Math.Max(1, Math.Ceiling(_config.HeartbeatTimeout.TotalSeconds));
    return options;

Labels

Release Not Needed Add this label if a PR does not need to be released.


3 participants